When Stream Processing crosses MapReduce
Abstract
Although Event Stream Processing (ESP) systems have existed for more than a decade, we are recently witnessing a true renaissance of ESP systems that adopt the popular MapReduce paradigm. In this white paper, we advocate the StreamMapReduce approach because it (i) allows a quick and easy transition of legacy MapReduce-based applications to ESP, (ii) simplifies the implementation of fault-tolerance mechanisms, and (iii) provides the elasticity needed to operate in today's cloud environments. We furthermore showcase two real-world applications, from the areas of SmartGrids and geo-spatial data stream analysis, in which the StreamMapReduce approach has been successfully applied.

Keywords: ESP, event stream processing, programming model, stateful event processing, fault tolerance, elasticity
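The abstract describes the StreamMapReduce programming model only at a high level. As a minimal sketch, assuming a single-process setting and using hypothetical names (`stream_map_reduce`, `meter_mapper`, `running_mean_reducer`) that are not taken from the paper, the Python below shows the core idea of applying a map step and a stateful reduce step to events one at a time rather than to a finished batch: each mapped key-value pair updates that key's state and immediately emits an updated result. The smart-meter readings are made-up illustrative data.

```python
from typing import Any, Callable, Dict, Hashable, Iterable, Iterator, List, Tuple

# Hypothetical sketch of a streaming map/reduce loop; not the paper's API.
Pair = Tuple[Hashable, Any]


def stream_map_reduce(
    events: Iterable[Any],
    mapper: Callable[[Any], List[Pair]],
    reducer: Callable[[Hashable, Any, Any], Tuple[Any, Any]],
    init_state: Callable[[], Any],
) -> Iterator[Pair]:
    """Apply map and a stateful reduce to each event as it arrives.

    Unlike batch MapReduce there is no "all input seen" barrier: every mapped
    pair updates its key's state and yields a fresh result immediately.
    """
    states: Dict[Hashable, Any] = {}  # explicit per-key reducer state
    for event in events:
        for key, value in mapper(event):
            if key not in states:
                states[key] = init_state()
            states[key], output = reducer(key, value, states[key])
            yield key, output


# Hypothetical SmartGrid-style example: running mean consumption per meter.
def meter_mapper(event: Dict[str, Any]) -> List[Pair]:
    # One reading in, one (meter id, kWh) pair out.
    return [(event["meter"], event["kwh"])]


def running_mean_reducer(key: Hashable, value: float, state: Tuple[int, float]):
    # State is (count, total); the output is the updated running mean.
    count, total = state
    count, total = count + 1, total + value
    return (count, total), total / count


readings = [
    {"meter": "m1", "kwh": 2.0},
    {"meter": "m2", "kwh": 1.0},
    {"meter": "m1", "kwh": 4.0},
]
for meter, mean in stream_map_reduce(readings, meter_mapper, running_mean_reducer, lambda: (0, 0.0)):
    print(meter, mean)  # prints: m1 2.0, then m2 1.0, then m1 3.0
```

In this sketch the reducer state lives in an explicit per-key dictionary. That is the piece a checkpointing scheme would need to persist, or an elastic system would need to repartition by key; how the paper itself implements fault tolerance and elasticity is not described here.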
Similar resources
Stream Processing with Bigdata by SSS-MapReduce
We propose a MapReduce-based stream processing system, called SSS, which is capable of processing streams along with large-scale static data. Unlike existing stream processing systems, which can work only on a relatively small in-memory dataset, SSS can process incoming streamed data while consulting the stored data. SSS processes streamed data with continuous Mappers and Reducers, that are perio...
Stream Processing in the Cloud
Stock exchanges, sensor networks and other publish/subscribe systems need to deal with high-volume streams of real-time data. Financial data in particular has to be processed with low latency in order to cater for high-frequency trading algorithms. In order to deal with the large amounts of incoming data, the stream processing task has to be distributed. Traditionally, distributed stream processing...
Pig Squeal: Bridging Batch and Stream Processing Using Incremental Updates
Title of dissertation: Pig Squeal: Bridging Batch and Stream Processing Using Incremental Updates. James Holmes Lampton, Jr., Doctor of Philosophy, 2015. Dissertation directed by: Professor Ashok Agrawala, Department of Computer Science. As developers shift from batch MapReduce to stream processing for better latency, they are faced with the dilemma of changing tools and maintaining multiple code b...
MapReduce Online
MapReduce is a popular framework for data-intensive distributed computing of batch jobs. To simplify fault tolerance, many implementations of MapReduce materialize the entire output of each map and reduce task before it can be consumed. In this paper, we propose a modified MapReduce architecture that allows data to be pipelined between operators. This extends the MapReduce programming model bey...
Ad-hoc data processing in the cloud
Ad-hoc data processing has proven to be a critical paradigm for Internet companies processing large volumes of unstructured data. However, the emergence of cloud-based computing, where storage and CPU are outsourced to multiple third-parties across the globe, implies large collections of highly distributed and continuously evolving data. Our demonstration combines the power and simplicity of th...